1. Apache Kylin Installation Guide

	1.1 Download Link

		Kylin URL: 				http://apache.forsale.plus/kylin/apache-kylin-2.2.0/apache-kylin-2.2.0-bin-hbase1x.tar.gz
		Kylin ODBC Driver URL: 	http://kylin.apache.org/download/KylinODBCDriver-2.1.0.zip

	1.2 Log in to the sandbox shell at http://localhost:4200
	
		cd /root/TrainingOnHDP
		wget http://apache.forsale.plus/kylin/apache-kylin-2.2.0/apache-kylin-2.2.0-bin-hbase1x.tar.gz
		tar -xvf apache-kylin-2.2.0-bin-hbase1x.tar.gz
	

	1.3 Export KYLIN_HOME to point to the Kylin package folder

		export KYLIN_HOME=/root/TrainingOnHDP/apache-kylin-2.2.0-bin
		export KAFKA_HOME=/usr/hdp/2.6.3.0-235/kafka
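		These exports last only for the current shell session. A minimal sketch for persisting them, assuming the install paths used above:

```shell
# Persist the environment variables for future shell sessions.
# Paths assume the install locations used earlier in this guide.
export KYLIN_HOME=/root/TrainingOnHDP/apache-kylin-2.2.0-bin
export KAFKA_HOME=/usr/hdp/2.6.3.0-235/kafka
printf 'export KYLIN_HOME=%s\nexport KAFKA_HOME=%s\n' \
    "$KYLIN_HOME" "$KAFKA_HOME" >> ~/.bashrc
```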
	
	1.4 Start the HBase service (e.g., via the Ambari console)
	
	1.5 Add the following to /root/TrainingOnHDP/apache-kylin-2.2.0-bin/conf/kylin.properties
	
			kylin.server.cluster-servers=localhost:8744
		
		Make the following change in /root/TrainingOnHDP/apache-kylin-2.2.0-bin/tomcat/conf/server.xml
		
			Change Connector port="7070" to Connector port="8744"

		Make the following change in /root/TrainingOnHDP/apache-kylin-2.2.0-bin/sample_cube/template/kafka/DEFAULT.KYLIN_STREAMING_TABLE.json
		
			"host": "localhost" to "host": "sandbox-hdp.hortonworks.com"
			"port": 9092 to "port": 6667
			
	1.6 Go to the folder /root/TrainingOnHDP/apache-kylin-2.2.0-bin/bin
	
		Run the following to check the setup:
			
			./check-env.sh
		
		Start the Kylin server:
			
			./kylin.sh start
		
		Stop the Kylin server:
			
			./kylin.sh stop
		
	1.7 Open logs/kylin.log to check whether the Kylin server started properly
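		A minimal sketch for scanning the log, assuming the KYLIN_HOME path used in this guide (adjust if yours differs):

```shell
# Look for the startup banner or recent errors in the tail of kylin.log.
LOG="${KYLIN_HOME:-/root/TrainingOnHDP/apache-kylin-2.2.0-bin}/logs/kylin.log"
if [ -f "$LOG" ]; then
  tail -n 200 "$LOG" | grep -Ei 'started|ready|error|exception' || true
  result="checked $LOG"
else
  result="log not found: $LOG"
fi
echo "$result"
```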

	1.8 Log in to the Kylin console at http://localhost:8744 with user/password ADMIN/KYLIN (both are case sensitive)

	
2. Start with Sample Cubes
	
	2.1 Go to the folder /root/TrainingOnHDP/apache-kylin-2.2.0-bin/bin and run the following script to create the sample cube:
	
		./sample.sh
	
	2.2 Go to the Kylin web console and refresh the metadata (System tab -> Reload Metadata), or restart the Kylin server if refreshing the metadata doesn't work.
	
		After reloading the metadata, you will see a new project called "learn_kylin" when you click the "Manage Project" button. 
		
	2.3	Build the cube "kylin_sales_cube" 
	
		Expand the project; there is one cube called "kylin_sales_cube". Click the detail link to open the cube, choose "Build" under the "Actions" drop-down, 
		set the start date to 2014-01-01 and the end date to the current date, then click "Submit"
	
	2.4 Go to the Monitor tab and check the build status until it reaches 100%
	
	2.5 If you run into an OutOfMemory error, open /root/TrainingOnHDP/apache-kylin-2.2.0-bin/conf/kylin_hive_conf.xml and add the following:
	
		<property>
			<name>hive.tez.java.opts</name>
			<value>"-server -Xmx2048m -Djava.net.preferIPv4Stack=true"</value>
		</property>
		
	2.6 Under Insight section, run the following query:
	
		select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt
	
		The query should come back in under one second, whereas the same query in Hive would likely take around 15 seconds
		
	2.7 Verify the query result

	2.8 Log in to the Ambari console and start the Kafka service

	2.9 Open /root/TrainingOnHDP/apache-kylin-2.2.0-bin/bin/sample-streaming.sh and make the following change:

		Change localhost:9092 to sandbox-hdp.hortonworks.com:6667
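		The same change can be scripted with sed. A hedged sketch, demonstrated on a stand-in string; run the same expression with sed -i against bin/sample-streaming.sh (back the file up first):

```shell
# Replace the default Kafka broker address with the sandbox broker.
sample='localhost:9092'
patched=$(printf '%s\n' "$sample" \
  | sed 's/localhost:9092/sandbox-hdp.hortonworks.com:6667/')
echo "$patched"
```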
	
	2.10 Go to the folder /root/TrainingOnHDP/apache-kylin-2.2.0-bin/bin and run the following script to send sample streaming data to Kafka:
	
		./sample-streaming.sh
		
	2.11 Build the cube "kylin_streaming_cube" 
	
		Expand the project; there is one cube called "kylin_streaming_cube". Click the detail link to open the cube, choose "Build" under the "Actions" drop-down, 
		then click "Submit"
		
	2.12 Go to the Monitor tab and check the build status until it reaches 100%

	2.13 If you run into Map Reduce issues, make the following change in /root/TrainingOnHDP/apache-kylin-2.2.0-bin/conf/kylin_job_conf_inmem.xml to reduce 
		the memory requested from YARN:
		
		<property>
			<name>mapreduce.map.memory.mb</name>
			<value>2072</value>
			<description></description>
		</property>

		<property>
			<name>mapreduce.map.java.opts</name>
			<value>-Xmx1700m -XX:OnOutOfMemoryError='kill -9 %p'</value>
			<description></description>
		</property>
		
	2.14 Under Insight section, run the following query:
	
		select count(*), HOUR_START from kylin_streaming_table group by HOUR_START
	
	2.15 Verify the query result
		

3. Build Cube with Spark

	3.1 To enable Spark cubing, you must set kylin.env.hadoop-conf-dir to a directory that contains at least core-site.xml, hdfs-site.xml, hive-site.xml,
		mapred-site.xml, and yarn-site.xml
		
		Log in to your sandbox at localhost:4200
		
		export KYLIN_HOME=/root/TrainingOnHDP/apache-kylin-2.2.0-bin
		mkdir $KYLIN_HOME/hadoop-conf
		ln -s /etc/hadoop/conf/core-site.xml $KYLIN_HOME/hadoop-conf/core-site.xml
		ln -s /etc/hadoop/conf/hdfs-site.xml $KYLIN_HOME/hadoop-conf/hdfs-site.xml
		ln -s /etc/hadoop/conf/yarn-site.xml $KYLIN_HOME/hadoop-conf/yarn-site.xml
		ln -s /etc/hbase/conf/hbase-site.xml $KYLIN_HOME/hadoop-conf/hbase-site.xml
		ln -s /etc/hive/conf/hive-site.xml $KYLIN_HOME/hadoop-conf/hive-site.xml
		
		vi $KYLIN_HOME/hadoop-conf/hive-site.xml (change the "hive.execution.engine" value from "tez" to "mr"; don't forget to change it back to "tez" after you finish this lab)
		vi $KYLIN_HOME/conf/kylin.properties (add kylin.env.hadoop-conf-dir=/root/TrainingOnHDP/apache-kylin-2.2.0-bin/hadoop-conf)
	
		vi $KYLIN_HOME/conf/kylin.properties and uncomment the following for HDP:
		
			kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
			kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current	
			kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
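		Uncommenting those properties just means stripping the leading '#'. A sketch of the edit, shown on a sample line; the same expression can be applied with sed -i to kylin.properties for each of the three keys:

```shell
# Strip the leading '#' that comments out a property line.
sample='#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current'
uncommented=$(printf '%s\n' "$sample" | sed 's/^#//')
echo "$uncommented"
```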

		Restart the Kylin server
		
	3.2 Log in to the Kylin console and click Model, choose 'kylin_sales_cube', and click Edit under Actions. Keep clicking the Next button until the Advanced Setting page, switch 
		the Cube Engine from MapReduce to Spark, then save
		
	3.3 Build 'kylin_sales_cube', the same as in section 2.3
	
	3.4 Go to YARN (ResourceManager UI); you will see a running application with a name similar to "Cubing for:kylin_sales_cube segment e50daa73-e126-41ca-a9a7-0bc3879dd5fe"
	
	
4. MS Excel and KYLIN

	4.1 Add the following to /root/TrainingOnHDP/apache-kylin-2.2.0-bin/conf/kylin.properties

		kylin.server.mode=all
		
	4.2 Download Kylin ODBC Driver from http://kylin.apache.org/download/KylinODBCDriver-2.1.0.zip

		Run KylinODBCDriver.exe to install (choose the 64- or 32-bit installer depending on your system)
		
	4.3 Run C:\Windows\SysWOW64\odbcad64.exe

	4.4 Choose System DSN and add KylinODBCDriver

		DSN Name:			KylinDSN
		Server Host:		localhost
		Port:				8744
		Username:			ADMIN
		Password:			KYLIN
		
	4.5 Open Excel, Data->From Other Sources->From Microsoft Query

	4.6 Choose KylinDSN and add one table
	
		
5. Build Sales Batch Cube

	5.1 chmod 755 -R /root/TrainingOnHDP/SalesBatchOnKylin/lab

	5.2 Run the following command to deploy the cube:

		/root/TrainingOnHDP/SalesBatchOnKylin/lab/sales_cube/sales.sh

	5.3 After deployment, either refresh the metadata or restart Kylin

	5.4 Choose one of the options to build the cube
	
		5.4.1 Run the following command:
		
			/root/TrainingOnHDP/SalesBatchOnKylin/lab/oozie/kylin_cube_build.sh
			
		5.4.2 Set up and launch an Oozie job to build the use case 1 cube

			5.4.2.1 Start up the Oozie service

			5.4.2.2 Launch the Project
	
				/root/TrainingOnHDP/SalesBatchOnKylin/lab/oozie/oozie_cube.sh

	5.5 Log in to the Kylin console, go to the Monitor tab, and choose the "sales" project and "sales_cube" cube to check the build status until it reaches 100%
				
	5.6 Run the following queries from the Kylin Insight tab to verify your cube:
	
		select sales.store_id, store_city, count(*) as total_transactions, sum(store_sales) as total_sales, max(store_sales) as max_transaction_sales from sales join store on sales.store_id = store.store_id
		group by sales.store_id, store_city
		order by sales.store_id

		select sales.store_id, brand_name, count(*) as total_transactions, sum(store_sales) as total_sales, max(store_sales) as max_transaction_sales from sales join product on sales.product_id = product.product_id
		group by sales.store_id, brand_name
		order by sales.store_id

		select store_city, brand_name, count(*) as total_transactions, sum(store_sales) as total_sales, max(store_sales) as max_transaction_sales from sales join product on sales.product_id = product.product_id join store on sales.store_id = store.store_id
		group by store_city, brand_name
		order by store_city, brand_name
	
		select brand_name, count(*) as total_transactions, sum(store_sales) as total_sales, max(store_sales) as max_transaction_sales from sales join product on sales.product_id = product.product_id
		group by brand_name
		order by brand_name
	
		select brand_name, store_city,  count(*) as total_transactions, sum(store_sales) as total_sales, max(store_sales) as max_transaction_sales from sales join product on sales.product_id = product.product_id join store on sales.store_id = store.store_id
		group by brand_name, store_city
		order by brand_name, store_city

		select store_id, customer_id, sum(store_sales) from sales group by store_id, customer_id
	
	5.7 Run the following command from your SSH command line to verify your cube:

		curl -X POST -u ADMIN:KYLIN -H "Content-Type: application/json" -d '{ "sql":"select sales.store_id, store_city, count(*) as total_transactions, sum(store_sales) as total_sales, max(store_sales) as max_transaction_sales from sales join store on sales.store_id = store.store_id group by sales.store_id, store_city order by sales.store_id", "project":"sales" }' http://localhost:8744/kylin/api/query
	
		
	
6. Build Stocks Streaming Cube

	6.1 Start up the Kafka service

	6.2 SSH to the sandbox and run the following commands to create the Kafka topic:

		export KYLIN_HOME=/root/TrainingOnHDP/apache-kylin-2.2.0-bin
		export KAFKA_HOME=/usr/hdp/2.6.3.0-235/kafka
	
		/usr/hdp/2.6.3.0-235/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kylinstock
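		To confirm the topic was created, the Kafka listing command can be run against the same ZooKeeper. A sketch assembling the command from the exports above (it is only echoed here; run it on the sandbox and check that "kylinstock" appears in the output):

```shell
# Build the topic-listing command from KAFKA_HOME; run it on the sandbox to
# verify that the "kylinstock" topic exists.
KAFKA_HOME="${KAFKA_HOME:-/usr/hdp/2.6.3.0-235/kafka}"
list_cmd="$KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper localhost:2181"
echo "$list_cmd"
```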
   
	6.3 YARN memory should be more than 3072 MB

	6.4 Set up NiFi, Kafka, and Google Finance to build the use case 2 streaming cube

		6.4.1 chmod 755 -R /root/TrainingOnHDP/StocksStreamingOnKylin/lab
	
		6.4.2 Run the following command to deploy the cube	

			/root/TrainingOnHDP/StocksStreamingOnKylin/lab/stocks_cube/stocks.sh

		6.4.3 After deployment, either refresh the metadata or restart Kylin
 
		6.4.4 Here is the command to trigger the streaming cube build manually
 
			curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" -d '{ "sourceOffsetStart": 0, "sourceOffsetEnd": 9223372036854775807, "buildType": "BUILD"}' http://localhost:8744/kylin/api/cubes/streaming_stocks_cube/build2

		6.4.5 Deploy the NiFi workflow and start it

	6.5 Log in to the Kylin console (http://localhost:8744/kylin) as ADMIN/KYLIN, go to the Insight page, and run the following query:

		select E as exchange, T as ticker, count(*) as transaction, max(L_CUR) as max_price, min(L_CUR) as min_price from streaming_stocks_table group by E, T
